ABSTRACT



A digital audio signal is encoded using both frequency and
time masking effects. The frequency band of the digital audio
signal is divided into a number of subbands. First
signal-to-mask ratios for the respective subbands are
estimated in response to the digital signal samples in each
subband of the ith frame included in the audio signal. The
first signal-to-mask ratios for the ith frame are stored for
a predetermined time period, and delayed signal-to-mask
ratios for the (i-1)st frame, which were previously stored,
are output in synchronization with the first signal-to-mask
ratios. Second signal-to-mask ratios are derived from the
first signal-to-mask ratios and the delayed signal-to-mask
ratios. Bits are adaptively allocated to each of the subbands
based on the second signal-to-mask ratios. The digital signal
samples in each subband are quantized in response to the
generated bit allocation information for that subband.
Finally, the quantized digital signal samples are formatted
together with the generated bit allocation information.

3 Claims, 1 Drawing Sheet 
PREDICTIVE TECHNIQUE FOR SIGNAL TO
MASK RATIO CALCULATIONS 

FIELD OF THE INVENTION 

The present invention relates to a method for encoding a 
digital audio signal; and, more particularly, to an improved 
method for encoding a digital audio signal comprising a 
plurality of frames based on a human auditory perception 
with respect to frequency and time masking effects. 

DESCRIPTION OF THE PRIOR ART 

Transmission of digitized audio signals makes it possible
to deliver high quality audio signals comparable to those of
a compact disc and/or a digital audio tape. When an audio
signal is expressed in digital form, a substantial amount of
data needs to be transmitted, especially in the case of a high
definition television system. Since, however, the available
frequency bandwidth assigned to such digital audio signals
is limited, it becomes inevitable to compress the digital
audio data in order to transmit such substantial amounts of
digital data, e.g., 768 Kbits per second for a 16 bit PCM
(Pulse Code Modulation) audio signal with a 48 KHz sampling
frequency, through the limited audio bandwidth of, e.g.,
about 128 KHz.
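
The 768 Kbits per second figure quoted above follows directly
from the sample size and the sampling frequency. The short
sketch below reproduces that arithmetic; the function name and
the single-channel default are illustrative assumptions and do
not appear in the original text.

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Return the raw (uncompressed) PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


# 16 bit PCM at a 48 KHz sampling frequency, one channel (assumed).
rate_bps = pcm_bit_rate(16, 48_000)
print(rate_bps / 1000, "Kbits per second")
```

A two-channel signal with the same parameters would need twice
this rate, which is why compression becomes necessary on a
bandwidth-limited channel.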

Among the various audio compression devices or
techniques, the so-called MPEG (Moving Picture Experts
Group) Audio Algorithm, which employs a psychoacoustic
algorithm, has been suggested for HDTV applications.

In an audio encoding system which adopts the above
MPEG audio technique, four primary parts, i.e., subband
filtering, psychoacoustic modeling, quantizing and coding,
and frame formatting, are employed to compress the digital
audio data. The subband filtering is a process of mapping
an input PCM digital audio signal from the time domain to
the frequency domain. A filterbank with B (e.g., 32)
subbands may be used. In each subband, 12 or 36 samples are
grouped for the processing thereof; and the grouped samples
from said B subbands, i.e., B×12 or B×36 samples, constitute
a "frame", which is a processing unit for the encoding,
transmission and decoding of audio signals. The
psychoacoustic modeling creates a set of data, e.g., SMR
(signal-to-mask ratio) data, for each subband or group of
subbands through the use of a frequency masking effect, to
thereby control the quantizing and coding thereof, wherein
the frequency masking effect represents an increase in the
audible limit or threshold of audibility for a sound caused
by the presence of another (i.e., masking) contemporary
sound in the frequency domain. Available bits are then
adaptively allocated to each subband of a frame with
reference to the SMR in the process of quantizing and coding
the subband samples. A frame formatter formats the frame
data together with other required side information in a
suitable fashion for transmission.
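
The frame sizes implied by the grouping described above can be
worked out as follows. This is a minimal sketch: the helper
names are hypothetical, and the frame duration assumes a
critically sampled filterbank (so that the number of subband
samples in a frame equals the number of input PCM samples it
covers) and the 48 KHz sampling frequency mentioned earlier.

```python
def frame_samples(num_subbands: int, samples_per_subband: int) -> int:
    """Total number of subband samples grouped into one frame."""
    return num_subbands * samples_per_subband


def frame_duration_ms(total_samples: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds (critical sampling assumed)."""
    return 1000.0 * total_samples / sample_rate_hz


for group in (12, 36):
    n = frame_samples(32, group)
    print(group, "samples per subband:", n, "samples per frame,",
          frame_duration_ms(n, 48_000), "ms")
```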

Even though this technique may enhance the coding
efficiency through the use of the frequency masking effect,
it is incapable of reflecting a time masking effect, i.e., a
phenomenon wherein the audible limit or threshold of
audibility for a sound is raised due to the presence of another
temporally adjacent sound in the time domain. It is thus
unable to provide an audio signal encoding which fully
improves the coding efficiency.

SUMMARY OF THE INVENTION

It is, therefore, a primary object of the present invention
to provide an improved method for encoding a digital audio
signal comprising a plurality of frames based on frequency
and time masking effects, thereby enhancing the coding
efficiency thereof.

In accordance with the present invention, there is
provided a method for adaptively encoding a digitally sampled
audio signal including a plurality of frames, which
comprises the steps of: (a) dividing the frequency band of the
digital audio signal into a number of B subbands, wherein B
is an integer greater than 1 and the bandwidths of the
subbands substantially correspond to bandwidths which are
critical to the human auditory system; (b) estimating first
signal-to-mask ratios for the respective subbands in response
to the digital signal samples in each subband for an ith frame
included in the digital audio signal, i being a frame index; (c)
storing the first signal-to-mask ratios for the ith frame for a
predetermined time period and generating delayed
signal-to-mask ratios for the (i-1)st frame prestored therein
synchronized with the first signal-to-mask ratios; (d) providing
second signal-to-mask ratios based on the first
signal-to-mask ratios and the delayed signal-to-mask ratios;
(e) adaptively determining bits for each of the subbands
based on the second signal-to-mask ratios, and for generating
bit allocation information corresponding to the determined
bits for each of the subbands; (f) quantizing the digital signal
samples in each subband in response to the generated bit
allocation information for each of the subbands; and (g)
formatting the quantized digital signal samples together with
the generated bit allocation information.

BRIEF DESCRIPTION OF THE DRAWING 

The above and other objects and features of the present
invention will become apparent from the following
description given with reference to the accompanying drawing,
which is a block diagram schematically illustrating an
apparatus for encoding an input digital audio signal in
accordance with the present invention.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

Referring to the drawing, there is shown a block diagram
schematically illustrating an apparatus for encoding a digital
audio signal in accordance with the present invention.

The digital audio encoding apparatus 100 comprises a
subband filtering block 110, first and second perceptual
parameter estimators 120 and 140, a delay circuit 130, a bit
allocation and quantization block 150, and a formatting
circuit 160.

A digitally sampled input audio signal X(n) of an ith
frame, which includes N samples, i.e., n=0, 1, . . . , N-1, is
applied to the first perceptual parameter estimator 120 and
the subband filtering block 110 which is adapted to perform
a subband filtering operation of the input digital audio
signal, wherein N is a positive integer. A "frame" used herein
denotes a part of the digital audio signal which corresponds
to a fixed number of audio samples and is a processing unit
for the encoding and decoding of the digital audio signal.
The subband filtering block 110 receives the input digital
audio signal of the ith frame and divides the frequency band
of the input digital audio signal into a number of B, e.g., 32,
subbands by employing a subband filtering technique well
known in the art, e.g., the method disclosed in the so-called
MPEG Audio Algorithm described in ISO/IEC JTC1/SC2/WG 11,
"Part 3, Audio Proposal", CD-11172-3 (1991),
wherein the bandwidths of the subbands substantially
correspond to bandwidths which are critical to the human
auditory system. The digital signal samples in each of the
subbands are then provided from the subband filtering block
110 to the bit allocation and quantization block 150.




5,737,721 



On the other hand, the first perceptual parameter estimator 
120 receives the digitally sampled input audio signal of the 
ith frame and estimates first signal-to-mask ratios for the ith
frame by using a psychoacoustic model, e.g., the one dis- 
cussed in the MPEG Audio Algorithm supra. The first 
signal-to-mask ratio for each subband of the ith frame, 
which is well known in the art, may be derived as follows:

SMR_1(j, i) = P(j, i) - M(j, i)        Eq. (1)

wherein i is a frame index; j, a subband index with j=0, 1,
. . . , B-1, B being the total number of subbands in a frame;
SMR_1(j, i), a first signal-to-mask ratio in subband j of the ith
frame; P(j, i), a sound pressure level in subband j of the ith
frame estimated by an FFT (Fast Fourier Transform)
technique; M(j, i), a frequency masking threshold in subband j
of the ith frame; and said SMR_1(j, i), P(j, i) and M(j, i) are
all expressed in dB (decibel) units.
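
Since all three quantities are expressed in dB, the first
signal-to-mask ratio of a subband is conventionally the sound
pressure level minus the frequency masking threshold. The
sketch below illustrates that per-subband subtraction; the
function name and the sample values are hypothetical, and the
estimation of P(j, i) and M(j, i) themselves (FFT analysis and
the psychoacoustic model) is assumed to be done elsewhere.

```python
from typing import List


def first_smr(spl_db: List[float], mask_db: List[float]) -> List[float]:
    """Per-subband signal-to-mask ratio in dB: P(j, i) - M(j, i)."""
    if len(spl_db) != len(mask_db):
        raise ValueError("one level and one threshold are needed per subband")
    return [p - m for p, m in zip(spl_db, mask_db)]


# Hypothetical levels for three subbands of one frame, in dB.
print(first_smr([60.0, 45.0, 30.0], [40.0, 50.0, 30.0]))
```

A positive value means the signal in that subband rises above
the masking threshold; a value at or below zero means it is
masked.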

The frequency masking threshold represents an audible 
limit which is a sum of the intrinsic audible limit or 
threshold of a sound and an increment caused by the 
presence of other tonal and non-tonal components of the 
audio signal. The first signal-to-mask ratios for the ith frame 
are then fed to the delay circuit 130 and the second percep- 
tual parameter estimator 140. 

In the delay circuit 130, the first signal-to-mask ratios for
the ith frame are stored in a memory (not shown) thereof and
delayed for a predetermined time period; and delayed
signal-to-mask ratios for the (i-1)st frame prestored in the
memory are provided to the second perceptual parameter
estimator 140 synchronized with the first signal-to-mask ratios
applied thereto. The delay circuit 130 can be easily
implemented by employing general electronic circuitries well
known in the art. The predetermined time period, i.e., the
delay time of the delay circuit 130, is determined by taking
into account the time masking effect representative of a
phenomenon wherein the audible limit or threshold of
audibility for a sound is raised due to the presence of another
temporally adjacent sound in the time domain. In a preferred
embodiment of the present invention, the predetermined delay
time is equivalent to one frame processing time of the digital
audio signal. The delayed signal-to-mask ratios for the
(i-1)st frame and the first signal-to-mask ratios for the ith
frame are simultaneously fed to the second perceptual
parameter estimator 140 which calculates second
signal-to-mask ratios for the ith frame as follows:

SMR_2(j, i) = MIN[k × DSMR_1(j, i-1), SMR_1(j, i)]        Eq. (2)



wherein SMR_1(j, i), j and i have the same meanings as
previously defined; SMR_2(j, i), a second signal-to-mask
ratio in subband j of the ith frame; DSMR_1(j, i-1), a delayed
signal-to-mask ratio in subband j of the (i-1)st frame; and k,
a constant larger than 0 and smaller than 1.

In the preferred embodiment of the invention, the constant
value k can be determined based on the time masking effect
of the human auditory perception, and is preferably set to
0.5, said value 0.5 being an appropriate value to reflect the
time masking effect.
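
The one-frame delay and the second estimator described above
can be sketched together as a small stateful object that
remembers the previous frame's first ratios and takes, per
subband, the smaller of k times the delayed ratio and the
current ratio, as in Eq. (2). The class and method names are
hypothetical. The text does not say what is output for the very
first frame, when nothing has been stored yet; passing the first
ratios through unchanged is an assumption of this sketch.

```python
from typing import List, Optional


class SecondSmrEstimator:
    """One-frame delay plus the MIN rule of Eq. (2)."""

    def __init__(self, k: float = 0.5) -> None:
        if not 0.0 < k < 1.0:
            raise ValueError("k must be larger than 0 and smaller than 1")
        self.k = k
        self._delayed: Optional[List[float]] = None  # ratios of frame i-1

    def update(self, first_smr: List[float]) -> List[float]:
        """Store frame i and return the second ratios for frame i."""
        delayed = self._delayed
        self._delayed = list(first_smr)  # becomes DSMR for the next frame
        if delayed is None:
            # Assumption: no previous frame yet, so pass through unchanged.
            return list(first_smr)
        if len(delayed) != len(first_smr):
            raise ValueError("number of subbands changed between frames")
        return [min(self.k * d, s) for d, s in zip(delayed, first_smr)]


estimator = SecondSmrEstimator(k=0.5)
for frame in ([20.0, 10.0], [30.0, 2.0], [4.0, 8.0]):
    print(estimator.update(frame))
```

Because the result never exceeds the first ratio, the bit
allocation that follows never asks for more bits than the
frequency-only model would.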

The second signal-to-mask ratio for each subband of the
ith frame from the second perceptual parameter estimator
140 is then provided to the bit allocation and quantization
block 150. In the bit allocation and quantization block 150,
bits for each of the subbands are adaptively determined
based on the second signal-to-mask ratio for each of the
subbands of the ith frame, and bit allocation information
corresponding to the determined bits for each of the
subbands is generated. Thereafter, the digital signal samples in
each subband are quantized in response to the generated bit
allocation information for each of the subbands, and the
quantized digital signal samples for each subband of the ith
frame and the bit allocation information are simultaneously
provided to the formatting circuit 160. At the formatting
circuit 160, the quantized digital signal samples and the bit
allocation information from the bit allocation and
quantization block 150 are formatted and transmitted to a
transmitter (not shown) for the transmission thereof. The
principles and functions of the bit allocation and
quantization block 150 and the formatting circuit 160 are
basically identical to those which can be found in the MPEG
Audio Algorithm.
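
The text defers the details of bit allocation to the MPEG Audio
Algorithm. Purely as an illustration of how signal-to-mask
ratios can drive an adaptive allocation, the sketch below uses a
simple greedy loop: it repeatedly gives one more bit per sample
to the subband whose quantization noise is furthest above its
masking threshold. The rule of thumb of about 6.02 dB of
signal-to-noise ratio per bit, the 15 bit ceiling, the
12 samples per subband and all names are assumptions of this
sketch; it is not the allocation table procedure of the
standard.

```python
from typing import List


def allocate_bits(smr_db: List[float], bit_pool: int,
                  samples_per_subband: int = 12,
                  max_bits: int = 15,
                  db_per_bit: float = 6.02) -> List[int]:
    """Greedy allocation of bits per sample to each subband."""
    bits = [0] * len(smr_db)
    if not smr_db:
        return bits
    cost = samples_per_subband  # bits consumed by one extra bit per sample
    while bit_pool >= cost:
        # Noise-to-mask ratio: how far quantization noise is above the mask.
        nmr = [s - db_per_bit * b if b < max_bits else float("-inf")
               for s, b in zip(smr_db, bits)]
        j = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[j] <= 0.0:
            break  # noise is at or below the mask in every subband
        bits[j] += 1
        bit_pool -= cost
    return bits


# Hypothetical second signal-to-mask ratios for three subbands.
print(allocate_bits([20.0, -5.0, 7.0], bit_pool=120))
```

Under this scheme a lower ratio stops drawing bits sooner, so
any reduction introduced by the time masking step leaves more
of the pool for other subbands or allows a lower overall rate.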

While the present invention has been shown and 
described with reference to the particular embodiments, it 
will be apparent to those skilled in the art that many changes 
and modifications may be made without departing from the 
spirit and scope of the invention as defined in the appended 
claims. 

What is claimed is: 

1. A method for adaptively encoding a digitally sampled 
audio signal including a plurality of frames, which com- 
prises the steps of: 

(a) dividing the frequency band of the digital audio signal 
into a number of P subbands, wherein said P is an
integer larger than 1 and the bandwidths of said sub- 
bands substantially correspond to bandwidths which 
are critical to a human auditory system; 

(b) estimating first signal-to-mask ratios for the respective
subbands in response to the digital signal samples in 
each subband included in the ith frame of the digital 
audio signal, said i being a frame index; 

(c) storing the first signal-to-mask ratios for the ith frame 
for a predetermined time period and generating delayed 
signal-to-mask ratios for the (i-1)st frame prestored
therein synchronized with the first signal-to-mask 
ratios; 

(d) providing second signal-to-mask ratios based on the 
first signal-to-mask ratios and the delayed signal-to- 
mask ratios; 

(e) adaptively determining bits for each of the subbands 
based on the second signal-to-mask ratios, and for 
generating bit allocation information corresponding to 
the determined bits for each of the subbands; 

(f) quantizing the digital signal samples in each subband 
in response to the generated bit allocation information 
for each of the subbands; and 

(g) formatting the quantized digital signal samples 
together with the generated bit allocation information.

2. The method as recited in claim 1, wherein the second
signal-to-mask ratio in subband j of the ith frame, SMR_2(j,
i), is determined as:

SMR_2(j, i) = MIN[k × DSMR_1(j, i-1), SMR_1(j, i)]

wherein j is a subband index with j=0, 1, . . . , P-1, P being the
total number of subbands in a frame; i, a frame index;
DSMR_1(j, i-1), a delayed signal-to-mask ratio in subband j
of the (i-1)st frame; SMR_1(j, i), a signal-to-mask ratio in
subband j of the ith frame; and k, a constant larger than 0 and
smaller than 1.

3. The method as recited in claim 2, wherein the constant
k is 0.5. 